Scholarship project
  • Introduction
  • Data Cleaning
  • EDA
  • Analysis
    • Overview
    • RCP8.5
    • RCP4.5
  • Furture Actions

On this page

  • Introduction
  • Feature Selection
  • Basic Analysis 4.5 vs 8.5
    • Scatterplot
      • Useful Variables
    • Pearson Correlation
      • RCP4.5
      • RCP8.5
    • RMSE
  • Conclusion

Dataset Analysis

Author

JaeHo Bahng

Published

May 19, 2024

Introduction

Before we dive into scenarios, lets conduct a simple comparison on the overall trend of RCP 4.5 and 8.5. What variables influence annual temperature the most when grouped into two RCP scenarios as a whole?

Methodology

We will be using scatterplots and pearson correlation and RMSE to find similarities and differences between the two RCP scenarios.

Import module / Set options and theme
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import xml.etree.ElementTree as ET
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import ttest_rel
from statsmodels.stats.weightstats import ttest_ind
import numpy as np
import pingouin as pg
from scipy.stats import zscore
import plotly.graph_objects as go
import pandas as pd
from plotly.subplots import make_subplots
import warnings
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import plotly.express as px
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 10)

Feature Selection

When trying to find correlations between Annual Temperature, the temperature related features such as Tmin, Tmax, T_Seasonal always turned out to have a high correlation with annual temperature and therefore acted as a hinderance for proper analysis. I removed all temperature related variables to clearly investigate what properties affect the temperature.

Import cleaned data
df = pd.read_csv('../data/cleaned_df.csv')
df['Location_ID'] = df.groupby(['long', 'lat']).ngroup() + 1

group_list = ['Park', 'long', 'lat', 'veg', 'year', 'TimePeriod', 'RCP','treecanopy', 'Ann_Herb', 'Bare', 'Herb', 'Litter', 'Shrub', 'El', 'Sa','Cl', 'RF', 'Slope', 'E', 'S']
veg_location = df.drop(labels='scenario',axis=1).groupby(group_list).mean().reset_index()
# veg_location['T_Annual'] = (veg_location['T_Annual'] - veg_location['T_Annual'].min()) / (veg_location['T_Annual'].max() - veg_location['T_Annual'].min())


# Average Scenario Dataset
# Convert to numeric, coercing errors to NaN
numeric_series = pd.to_numeric(veg_location['RCP'], errors='coerce')

numeric_series

# Fill NaNs with original non-numeric values
veg_location['RCP'] = numeric_series.fillna(veg_location['RCP'])

four = veg_location[veg_location['RCP'].isin([4.5])]
eight = veg_location[veg_location['RCP'].isin([8.5])]
four_h = veg_location[veg_location['RCP'].isin(['historical'])]
four_h['RCP'] = 4.5
eight_h = veg_location[veg_location['RCP'].isin(['historical'])]
eight_h['RCP'] = 8.5

df_con = pd.concat([four_h, four, eight_h, eight], ignore_index=True)
df_con['Location_ID'] = df_con.groupby(['long', 'lat']).ngroup() + 1


# Scenario Dataset
# Convert to numeric, coercing errors to NaN
numeric_series = pd.to_numeric(df['RCP'], errors='coerce')

numeric_series

# Fill NaNs with original non-numeric values
df['RCP'] = numeric_series.fillna(df['RCP'])

four = df[df['RCP'].isin([4.5])]
eight = df[df['RCP'].isin([8.5])]
four_h = df[df['RCP'].isin(['historical'])]
four_h['RCP'] = 4.5
eight_h = df[df['RCP'].isin(['historical'])]
eight_h['RCP'] = 8.5

df_orig = pd.concat([four_h, four, eight_h, eight], ignore_index=True)
df_orig['Location_ID'] = df_orig.groupby(['long', 'lat']).ngroup() + 1

selected_columns = [col for col in df.columns if not col.startswith(('T_', 'Tmin', 'Tmax'))]
dropped_columns = [col for col in df.columns if col.startswith(('T_', 'Tmin', 'Tmax'))]
filtered_df = df_orig[selected_columns]
filtered_df['T_Annual'] = df_orig['T_Annual']

df_orig = filtered_df

print("Dropped Columns : ", dropped_columns)
Dropped Columns :  ['T_P_Corr', 'T_Winter', 'T_Spring', 'T_Summer', 'T_Fall', 'Tmax_Winter', 'Tmax_Spring', 'Tmax_Summer', 'Tmax_Fall', 'Tmin_Winter', 'Tmin_Spring', 'Tmin_Summer', 'Tmin_Fall', 'T_Annual']

Basic Analysis 4.5 vs 8.5

Scatterplot

With a basic scatterplot, we can see basic correlations of how each numerical variable correlates to either the annual temperature or the annual percipitation. Since RCP 8.5 and RCP 4.5 have different predictions, two plots were used for each scenario.

Firstly, without an additional feature, we can see that the more percipitation, the lower the annual temperature because we can easily draw a line with a negative slope through the scaterred plots.

4.5 vs 8.5 scatterplot
# Assuming df_con is your DataFrame and is already loaded
# List of columns to use for coloring
test = df_con.iloc[:,list(range(1, 3))+ [4,6] + list(range(8, len(df_con.columns)-1))]
color_columns = list(test.columns)
rcp_values = test['RCP'].unique()

subplot_titles = [f'RCP {rcp}' for rcp in rcp_values]

# Create figure with subplots for each RCP value
fig = make_subplots(rows=1, cols=len(rcp_values), shared_yaxes=True, subplot_titles=subplot_titles, horizontal_spacing=0.15)

# Add a scatter trace for each color column and each RCP value
for i, col in enumerate(color_columns):
    for j, rcp in enumerate(rcp_values):
        fig.add_trace(
            go.Scatter(
                x=test[(test['year'].isin(range(2060, 2100))) & (test['RCP'] == rcp)]['PPT_Annual'],
                y=test[(test['year'].isin(range(2060, 2100))) & (test['RCP'] == rcp)]['T_Annual'],
                mode='markers',
                marker=dict(
                    color=test[(test['year'].isin(range(2060, 2100))) & (test['RCP'] == rcp)][col],
                    colorbar=dict(
                        # title='Scale',
                                  tickmode='array',
                                  tickvals=[round(i,2) for i in np.linspace(start=round(min(test[(test['year'].isin(range(2060, 2100)) & (test['RCP'] == rcp))][col]),2),stop=round(max(test[(test['year'].isin(range(2060, 2100)) & (test['RCP'] == rcp))][col]),2),num=5)],
                                  ticktext=[round(i,2) for i in np.linspace(start=round(min(test[(test['year'].isin(range(2060, 2100)) & (test['RCP'] == rcp))][col]),2),stop=round(max(test[(test['year'].isin(range(2060, 2100)) & (test['RCP'] == rcp))][col]),2),num=5)],
                                  y=0.5,
                                  x= 0.43 + (j*0.58)
                                  ),
                                  colorscale='rdpu'
                ),
                name=col,
                visible=True if i == 0 else False,
                hovertemplate=(
                    f"<b>{col}</b><br>"
                    "Precipitation: %{x}<br>"
                    "Temperature: %{y}<br>"
                    "RCP: " + str(rcp) + "<br>"
                    "Value: %{marker.color}<br>"
                    "<extra></extra>"
                  )  # This hides the secondary box with trace info  # Only the first trace is visible initially
            ),
            row=1, col=j+1
        )

# Updating the layout to add the title
fig.update_layout(
    title={
        'text': '<b>Annual Precipitation vs Temperature by RCP Scenarios</b>',
        'x': 0.5,
        'y': 0.97,
        'xanchor': 'center'
    },
    # title_font=dict(size=20),
    showlegend=False  # Hide legend since we are using colorbars
)

# Adding dropdown filter to change visible trace
dropdown_buttons = [
    {
        'label': col,
        'method': 'update',
        'args': [
            {
                'visible': [col == color_column for color_column in color_columns for _ in rcp_values]
            },
            {
                'title': {'text': f'<b>Annual Precipitation vs Temperature by {col}</b>', 'x':0.5, 'y':0.97},
                'marker': {'colorbar': {'title': 'Scale'}}
            }
        ]
    }
    for col in color_columns
]

fig.update_layout(
    updatemenus=[
        {
            'buttons': dropdown_buttons,
            'direction': 'down',
            'showactive': True,
            'x': 0.5,
            'xanchor': 'center',
            'y': 1.19,
            'yanchor': 'top'
        }
    ]
)

fig.update_xaxes(title_text="Annual Precipitation", row=1, col=1)
fig.update_yaxes(title_text="Annual Temperature", row=1, col=1)
fig.update_xaxes(title_text="Annual Precipitation", row=1, col=2)

for annotation in fig['layout']['annotations']:
    annotation['font'] = {'size': 12, 'color': 'black'}

# Show the figure
fig.show()

Useful Variables

By trying each numerical variable as a color metric for the scatter plots, four important features that I found are as below:

  1. VWC_Summer_whole
  2. WetSoilDays_Spring_whole
  3. SWA_Fall_whole
  4. Bare

For now, rather than focusing on the seasonality of the variables, lets focus on VWC, WetSoilDays, SWA and Bare. We can infer that it is important to keep the land moist to a certain level and minimize the ‘bareness’ in the area to lower the temperature. Lets move on to each RCP scenario for a more detailed analysis.

Important Features
feature = 'VWC_Summer_whole'
fig = px.scatter(df_con[df_con['year'].isin(range(2060,2099))], x="PPT_Annual", y="T_Annual",
                 color=feature, facet_col="RCP",     
                 labels={
        feature: 'Scale'  # Replace with your desired colorbar title
    })
fig.update_layout(
    title={
        'text': f'<b>Annual Precipitation/Temperature ({feature})</b>',
        # 'y':0.95,  # This adjusts the position of the title (vertically)
        'x':0.5,   # Centers the title horizontally
        'xanchor': 'center',  # Ensures the title is centered at the specified x position
        # 'yanchor': 'top'  # Ensures the title is positioned based on the top of the text
    },
    title_font=dict(size=20)  # Custom font settings
)
fig.show()


feature = 'WetSoilDays_Spring_whole'
fig = px.scatter(df_con[df_con['year'].isin(range(2060,2099))], x="PPT_Annual", y="T_Annual",
                 color=feature, facet_col="RCP",     
                 labels={
        feature: 'Scale'  # Replace with your desired colorbar title
    })
fig.update_layout(
    title={
        'text': f'<b>Annual Precipitation/Temperature ({feature})</b>',
        # 'y':0.95,  # This adjusts the position of the title (vertically)
        'x':0.5,   # Centers the title horizontally
        'xanchor': 'center',  # Ensures the title is centered at the specified x position
        # 'yanchor': 'top'  # Ensures the title is positioned based on the top of the text
    },
    title_font=dict(size=20)  # Custom font settings
)

fig.show()

feature = 'SWA_Fall_whole'
fig = px.scatter(df_con[df_con['year'].isin(range(2060,2099))], x="PPT_Annual", y="T_Annual",
                 color=feature, facet_col="RCP",     
                 labels={
        feature: 'Scale'  # Replace with your desired colorbar title
    })
fig.update_layout(
    title={
        'text': f'<b>Annual Precipitation/Temperature ({feature})</b>',
        # 'y':0.95,  # This adjusts the position of the title (vertically)
        'x':0.5,   # Centers the title horizontally
        'xanchor': 'center',  # Ensures the title is centered at the specified x position
        # 'yanchor': 'top'  # Ensures the title is positioned based on the top of the text
    },
    title_font=dict(size=20)  # Custom font settings
)

fig.show()

Pearson Correlation

What is Pearson Correlation? Pearson correlation is a measure of the linear relationship between two continuous variables. It quantifies the degree to which a change in one variable predicts a change in another variable, with values ranging from -1 to +1. A value of +1 indicates a perfect positive linear relationship, where an increase in one variable corresponds to a proportional increase in the other. A value of -1 indicates a perfect negative linear relationship, where an increase in one variable corresponds to a proportional decrease in the other. A value of 0 indicates that there is no linear relationship between the variables.

RCP4.5

By calculating a Pearson correlation matrix for all numerical variables in the dataset, we can create a heatmap to explore the relationships between different features. This visualization helps identify which features affect each other and highlights those that correlate most strongly with the annual temperature.

In the second plot below which is a plot for features sorted for most correlated with T_Annual, we see that the highest correlators are:

  • Extreme Short Term dry stress
  • Frost days

While these factors cannot be directly influenced, some takeaways from the plot would be that we can take action on the following factors which were the next highly ranked groups to mitigate temperature rise:

  • PET (Potential Evapotranspiration)
  • SWA (Soil Water Availability)
  • VWC (Volumetric Water Content)
Correlation Heatmap
# Calculate the correlation matrix
corr_matrix = df_orig[(df_orig['TimePeriod']!='Hist') & (df_orig['RCP']==4.5)].iloc[:,8:].corr()


# Create an interactive heatmap of the correlation matrix
fig = px.imshow(corr_matrix,
                # text_auto=True,  # Automatically add text in each cell
                labels=dict(color="Correlation"),
                x=corr_matrix.columns,
                y=corr_matrix.columns,
                color_continuous_scale='RdBu_r'
  )  # Red-Blue color map, reversed

fig.update_layout(
    width=800,  # Width of the figure in pixels
    height=900,  # Height of the figure in pixels
    margin=dict(l=10, r=1, t=50, b=10)  # Reducing margins around the plot
)

# Adjusting color bar position
# fig.update_layout(coloraxis_colorbar=dict(
#     x=0.8  # Adjusts the horizontal position of the color bar
# ))
fig.update_layout(title_text='<b>Future Correlation Heatmap : RCP 4.5</b>',  # Bold text using HTML
                  title_x=0.5)  # Centers the title by setting the x position to 0.5

fig.update_xaxes(tickfont=dict(size=10))  # Sets the font size of x-axis labels
fig.update_yaxes(tickfont=dict(size=10))  # Sets the font size of y-axis labels
# fig.update_xaxes(side="bottom")
fig.show()
Correlation Feature Importance
# Sort the features by the absolute value of the loading for PCA1
sorted_loadings = corr_matrix['T_Annual'].abs().sort_values(ascending=False)

# Get the top 10 most influential features
top_features = sorted_loadings.head(20).index

# Get the actual loadings for these top 10 features
top_loadings = round(corr_matrix['T_Annual'].loc[top_features,],4)[1:]

# Create a color list based on the sign of the loadings
colors = ['blue' if val > 0 else 'red' for val in top_loadings]

# Create a bar chart
fig = go.Figure(data=[go.Bar(
    x=top_loadings.index,
    y=top_loadings.abs(),
    text=top_loadings.values,  # Show the actual values as text
    textposition='inside',
    marker_color=colors,
    showlegend=False
)])

# Add legend manually
fig.add_trace(go.Bar(
    x=[None], y=[None],
    marker=dict(color='blue'),
    showlegend=True,
    name='Positive'
))
fig.add_trace(go.Bar(
    x=[None], y=[None],
    marker=dict(color='red'),
    showlegend=True,
    name='Negative'
))

# Update layout for better readability
fig.update_layout(
    title='<b>Top Features Correlating to Annual Temperature (RCP = 4.5)</b>',
    xaxis_title='Features',
    yaxis_title='Absolute PCA1 Loadings',
    yaxis=dict(tickformat=".2f"),
    xaxis=dict(
        tickangle=45,  # Rotate x ticks
        tickfont=dict(size=10)  # Make the font smaller
    ),
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    margin=dict(l=20, r=20, b=20, t=60)
)

fig.show()

RCP8.5

In the analysis of Pearson correlation for different RCP (Representative Concentration Pathway) scenarios, it was observed that the ranking of features changed between the RCP 4.5 and RCP 8.5 scenarios. Here’s a detailed comparison:

Similarities:

  1. Higher ranked features are extreme short term dry stress and Frost Days
  2. Lower ranked features are VWC, SWA and PET

Differences:

  1. Correlation gap between higher ranked and lower Ranked Features

    • RCP 4.5 : The transition from higher to lower ranked features seem smooth on the plot and highest ranked VWC feature has a correlation of -0.37.
    • RCP 8.5 : There is an abrupt change when shifting to lower ranked features and the first VWC feature has a correlation of -0.26 which is significantly lower than the RCP 4.5 scenario.
  2. RCP 8.5 Scenario:

    • Correlation decrease in VWC and SWA Features: The number of VWC and SWA features significantly decreased, and their values decreased significantly compared to RCP4.5 correlations. Furthermore, there is a visual drop between PET_Spring and Evap_Spring whereas there was a smooth transition for RCP4.5.

Implications:

  1. RCP 4.5: This scenario suggests room for human actions to influence annual temperature.
  2. RCP 8.5: Under this more extreme scenario, unchangable variables become more dominant, overshadowing the influence of water-related variables, therefore implying that it is harder to influence temperature.

These differences highlight how the impact of climate variables can shift under different greenhouse gas concentration pathways, with temperature effects becoming more pronounced under higher emission scenarios.

What does this mean?
With the knowledge that 8.5 means higher CO2 emission as mentioned in the EDA tab, we can imply that if the RCP8.5 scenario takes place, there is less room for people to prevent temperatures from rising and preserving the environment. This is a logical statement even without the data but the data reinforces the idea by showing less variables in the pearson correlation with the annual temperature.

Correlation Heatmap
# Calculate the correlation matrix
corr_matrix = df_orig[(df_orig['TimePeriod']!='Hist') & (df_orig['RCP']==8.5)].iloc[:,8:].corr()


# Create an interactive heatmap of the correlation matrix
fig = px.imshow(corr_matrix,
                # text_auto=True,  # Automatically add text in each cell
                labels=dict(color="Correlation"),
                x=corr_matrix.columns,
                y=corr_matrix.columns,
                color_continuous_scale='RdBu_r'
  )  # Red-Blue color map, reversed

fig.update_layout(
    width=800,  # Width of the figure in pixels
    height=900,  # Height of the figure in pixels
    margin=dict(l=10, r=1, t=50, b=10)  # Reducing margins around the plot
)

# Adjusting color bar position
# fig.update_layout(coloraxis_colorbar=dict(
#     x=0.8  # Adjusts the horizontal position of the color bar
# ))
fig.update_layout(title_text='<b>Future Correlation Heatmap : RCP 8.5</b>',  # Bold text using HTML
                  title_x=0.5)  # Centers the title by setting the x position to 0.5

fig.update_xaxes(tickfont=dict(size=10))  # Sets the font size of x-axis labels
fig.update_yaxes(tickfont=dict(size=10))  # Sets the font size of y-axis labels
# fig.update_xaxes(side="bottom")
fig.show()
Correlation Feature Importance
# Sort the features by the absolute value of the loading for PCA1
sorted_loadings = corr_matrix['T_Annual'].abs().sort_values(ascending=False)

# Get the top 10 most influential features
top_features = sorted_loadings.head(20).index

# Get the actual loadings for these top 10 features
top_loadings = round(corr_matrix['T_Annual'].loc[top_features,],4)[1:]

# Create a color list based on the sign of the loadings
colors = ['blue' if val > 0 else 'red' for val in top_loadings]

# Create a bar chart
fig = go.Figure(data=[go.Bar(
    x=top_loadings.index,
    y=top_loadings.abs(),
    text=top_loadings.values,  # Show the actual values as text
    textposition='inside',
    marker_color=colors,
    showlegend=False
)])

# Add legend manually
fig.add_trace(go.Bar(
    x=[None], y=[None],
    marker=dict(color='blue'),
    showlegend=True,
    name='Positive'
))
fig.add_trace(go.Bar(
    x=[None], y=[None],
    marker=dict(color='red'),
    showlegend=True,
    name='Negative'
))

# Update layout for better readability
fig.update_layout(
    title='<b>Top Features Correlating to Annual Temperature</b>',
    xaxis_title='Features',
    yaxis_title='Absolute PCA1 Loadings',
    yaxis=dict(tickformat=".2f"),
    xaxis=dict(
        tickangle=45,  # Rotate x ticks
        tickfont=dict(size=10)  # Make the font smaller
    ),
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    margin=dict(l=20, r=20, b=20, t=60)
)

fig.show()

RMSE

To identify the most significant differences between two scenarios, I employed another method: calculating the RMSE (Root Mean Square Error) for each column. Given that both scenarios have the same number of data points, this approach provided an effective way to quantify the differences. Before performing the RMSE calculations, I standardized the data points to ensure consistency.

When analyzing differences using RMSE, precipitation-related features stood out as the most variable between the two RCP scenarios. Interestingly, it wasn’t always the scenario with the highest precipitation that had the lower temperature. A surprising difference was in PPT_Summer where RCP8.5 had higher average percipitation than RCP4.5. T

What does this mean?
This may be a sign that percipitation itself may not be an important factor when predicting the temperature, but may be used as a ratio to some other feature to predict.

RMSE
df_45 = df_orig[df_orig['RCP'] == 4.5]
df_85 = df_orig[df_orig['RCP'] == 8.5]

df1 = df_45.iloc[:,8:-3]
df2 = df_85.iloc[:,8:-3]

# Function to calculate z-scores
def standardize(df):
    return df.apply(zscore)

# Standardize both dataframes
z_df1 = standardize(df1)
z_df2 = standardize(df2)

# Calculate Absolute Difference of Z-Scores
abs_diff_z_scores = np.abs(z_df1 - z_df2)

# Mean Absolute Difference
mean_abs_diff = abs_diff_z_scores.mean()

# RMSE
rmse = np.sqrt(np.mean((z_df1.reset_index(drop=True) - z_df2.reset_index(drop=True))**2, axis=0))

rmse_sort = rmse.sort_values(ascending=False).head(20)

# Create a bar chart
fig = go.Figure(data=[go.Bar(
    x=rmse_sort.keys(),
    y=rmse_sort.values,
    text=[round(i,4) for i in rmse_sort.values],  # Show the actual values as text
    textposition='inside',
    # marker_color=colors,
    showlegend=False
)])

# Update layout for better readability
fig.update_layout(
    title='<b>RCP 4.5 vs 8.5 RMSE of components</b>',
    xaxis_title='Features',
    yaxis_title='Absolute PCA1 Loadings',
    yaxis=dict(tickformat=".2f"),
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    margin=dict(l=20, r=20, b=20, t=60)
)

fig.show()

for i, feature in enumerate(rmse_sort.keys()[:5]):
    print("Rank : ", i+1)
    print(f"RCP 4.5 Average {feature} : ", df_45[feature].mean())
    print(f"RCP 8.5 Average {feature} : ", df_85[feature].mean())
    print("")
Rank :  1
RCP 4.5 Average FrostDays_Summer :  0.00022957468556467173
RCP 8.5 Average FrostDays_Summer :  0.00011478734278233587

Rank :  2
RCP 4.5 Average PPT_Fall :  9.65464850475821
RCP 8.5 Average PPT_Fall :  9.665259282743092

Rank :  3
RCP 4.5 Average PPT_Spring :  6.52676644055382
RCP 8.5 Average PPT_Spring :  6.369110562075354

Rank :  4
RCP 4.5 Average Evap_Spring :  4.403702479181158
RCP 8.5 Average Evap_Spring :  4.269294863702628

Rank :  5
RCP 4.5 Average PPT_Summer :  8.145404406740747
RCP 8.5 Average PPT_Summer :  8.249074486053336

Conclusion

Scatterplot
With the scatterplot, we were able to visually identify features that have strong positive and negative correlations with annual temperature. A few variables that we found were VWC_Summer_whole, WetSoilDays_Spring_whole, SWA_Fall_whole, and Bare

Correlation Heatmap
Firstly, the correlation heatmaps give us a direct insight on which features directly correlate with annual temperature. Extreme Short Term Stress, Frost Days, PET, VWC were overlapping features for both RCP scenarios. However, the difference was that the RCP8.5 correlations had a sudden drop for the VWC variables allowing us to hypothesize that human actions will affect the temperature less when done in a scenario where more CO2 emission is happenning.

RMSE
The RMSE gives us a different perspective to features that correlate with annual temperature. For example, by calculating RMSE for each column against each other for different RCP scenarios, we know that percipitation is very different for the two scenarios. However, the higher temperature scenario also has higher percipitation which is contradicting to common logic. This may be a sign that percipitation itself may not be an important factor when predicting the temperature, but may be used as a ratio to some other feature to predict.